Learning Hadoop 2 by Garry Turkington & Gabriele Modena

Learning Hadoop 2 by Garry Turkington & Gabriele Modena

Author:Garry Turkington & Gabriele Modena [Turkington, Garry]
Language: eng
Format: epub, mobi
Publisher: Packt Publishing
Published: 2015-02-12T22:00:00+00:00


Now, it is just a matter of grouping hourly_tweets by hour and then generating a count of tweets per group, as follows:

hourly_tweets_count = FOREACH (GROUP hourly_tweets BY hour) { GENERATE FLATTEN(group), COUNT(hourly_tweets); }

Sessions

DataFu's Sessionize class can help us to better capture user activity over time. A session represents the activity of a user within a given period of time. For instance, we can look at each user's tweet stream at intervals of 15 minutes and measure these sessions to determine both network volumes as well as user activity:

DEFINE Sessionize datafu.pig.sessions.Sessionize('15m'); users_activity = FOREACH tweets { GENERATE CustomFormatToISO($0#'created_at', 'EEE MMMM d HH:mm:ss Z y') AS dt, (chararray)$0#'user'#'id' as user_id; } users_activity_sessionized = FOREACH (GROUP users_activity BY user_id) { ordered = ORDER users_activity BY dt; GENERATE FLATTEN(Sessionize(ordered)) AS (dt, user_id, session_id); }



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.